feat: intial hudi reg test #3641

Merged
flyrain merged 6 commits into apache:main from rahil-c:rahil/polaris-hudi-reg-test
Feb 18, 2026

Contributor

@rahil-c rahil-c commented Feb 2, 2026

Summary

  • Adding an initial regression test for the Polaris Hudi integration, following the existing pattern set by the Delta regression test.
  • Made changes in run.sh and setup.sh to ensure that the Spark session starts correctly for each table format.
  • Ran both the Delta and Hudi regression tests locally to confirm they pass.

cc @gh-yzou @singhpk234 @flyrain

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Contributor

@dimas-b dimas-b left a comment

Thanks for your contribution, @rahil-c! I'm just wondering if using shell might be overkill for this test. Specific comment thread below.

"http://${POLARIS_HOST:-localhost}:8181/api/catalog/v1/config?warehouse=${CATALOG_NAME}"
echo
echo "Catalog created"
cat << EOF | ${SPARK_HOME}/bin/spark-sql -S \
Contributor

Would it be possible to run this test as an integration test under JUnit inside the Gradle build?

Contributor Author

@rahil-c rahil-c Feb 9, 2026

Hi @dimas-b, we currently have an integration test for Polaris Hudi here: #3194

For the regression test, I was following the shell pattern that @gh-yzou used for Delta, now applied to Hudi.

Contributor

Hi @dimas-b, the purpose of this regression test is to validate the end-to-end user experience when using Spark with both --packages and --jars. This is an important scenario that cannot be fully covered by integration tests.
It is true that this test is relatively expensive, which is why it includes only very basic test cases. More complex scenarios and edge cases are covered by integration tests, which are more cost-effective.

Contributor

I believe Spark-based scenarios can be covered in regular CI with sufficient certainty... but I do not mean to block this PR on this point 🙂 Please consider optional.
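
For readers unfamiliar with the two dependency-loading paths named in this thread, a minimal sketch may help. `--packages` and `--jars` are real spark-sql options; everything else here (the function name, the `MAVEN_COORDS` and `BUNDLE_JAR` variables, the placeholder coordinates, and the jar path) is hypothetical and not taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: the coordinates, jar path, and function name are placeholders,
# not the values used in this PR.

# Print the spark-sql invocation for one dependency-loading mode.
#   packages: Spark resolves Maven coordinates at startup (--packages)
#   jars:     Spark loads jar files that are already on disk (--jars)
spark_sql_cmd() {
  local mode="$1"
  local spark_sql="${SPARK_HOME:-/opt/spark}/bin/spark-sql"
  case "$mode" in
    packages) echo "${spark_sql} --packages ${MAVEN_COORDS:-GROUP:ARTIFACT:VERSION}" ;;
    jars)     echo "${spark_sql} --jars ${BUNDLE_JAR:-/path/to/bundle.jar}" ;;
    *)        echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}
```

The function only prints the command line, so it can be inspected without a Spark installation. A real test would execute the command and feed SQL on stdin.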

# Define test suites to run
# Each suite specifies: test_file:table_format:test_shortname
declare -a TEST_SUITES=(
"spark_sql.sh:delta:spark_sql"
Contributor

How about we enforce a test file name format like xxx_<table_format>.sh, and use a separate folder that holds all test source files and reference files? Then we just need to list the folder to get all test files and extract the table format by parsing the file name. The benefit is that new tests become easy to onboard, and developers don't have to type a long string when running a single test (just the file name).
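
The naming convention proposed above could be sketched roughly as follows. This is an illustration, not code from the PR: the helper names `table_format_of` and `list_tests` and the directory passed in are hypothetical, and it assumes the table format is always the last underscore-separated token before `.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout and helper names are hypothetical.

# Extract the table format from a file name shaped like <name>_<table_format>.sh,
# e.g. spark_sql_hudi.sh -> hudi.
table_format_of() {
  local base
  base="$(basename "$1" .sh)"
  # Strip everything up to and including the last underscore.
  echo "${base##*_}"
}

# List every matching test script in a directory as "<file>:<table_format>".
list_tests() {
  local dir="$1" f
  for f in "${dir}"/*_*.sh; do
    [[ -e "$f" ]] || continue   # the glob matched nothing
    echo "$(basename "$f"):$(table_format_of "$f")"
  done
}
```

With this shape, adding a test means dropping a correctly named file into the folder, and a single test can be selected by file name alone.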

# this is mostly useful for building the Docker image with all needed dependencies
${SPARK_HOME}/bin/spark-sql -e "SELECT 1"
if [[ "$TABLE_FORMAT" == "hudi" ]]; then
# For Hudi: Pass --packages on command line to match official Hudi docs approach
Contributor

I don't think we need the if/else here anymore.

rm -rf /tmp/spark_hudi_catalog/

curl -i -X DELETE -H "Authorization: Bearer ${SPARK_BEARER_TOKEN}" -H 'Accept: application/json' -H 'Content-Type: application/json' \
http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs/${CATALOG_NAME} > /dev/stderr
(No newline at end of file)
Contributor

new line

@rahil-c rahil-c requested a review from gh-yzou February 15, 2026 20:51
gh-yzou previously approved these changes Feb 17, 2026

exit 1
fi

parse_test_suite() {
Contributor

Can we add a comment here about what this function is doing? It is trying to extract TABLE_FORMAT, TEST_SHORTNAME, and the full path of TEST_FILE, right?

Contributor Author

ack
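
For context, a function with the behavior described in this thread might look like the sketch below. The body of the PR's actual `parse_test_suite` is not shown in the conversation, so this is an assumption built from the `test_file:table_format:test_shortname` format in the script's comment. `TEST_DIR` is a hypothetical variable standing in for wherever the test files live.

```shell
#!/usr/bin/env bash
# Sketch only: TEST_DIR and the function body are assumptions, not the PR's code.

# Split a "test_file:table_format:test_shortname" spec into three globals:
# TEST_FILE (full path), TABLE_FORMAT, and TEST_SHORTNAME.
parse_test_suite() {
  local spec="$1" file
  IFS=':' read -r file TABLE_FORMAT TEST_SHORTNAME <<< "$spec"
  if [[ -z "$file" || -z "$TABLE_FORMAT" || -z "$TEST_SHORTNAME" ]]; then
    echo "invalid test suite spec: ${spec}" >&2
    return 1
  fi
  TEST_FILE="${TEST_DIR:-.}/${file}"
}
```

Calling `parse_test_suite "spark_sql.sh:delta:spark_sql"` sets `TABLE_FORMAT` to `delta` and `TEST_SHORTNAME` to `spark_sql`, with `TEST_FILE` resolved against `TEST_DIR`.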

exit 1
fi

# Allow running specific test via environment variable
Contributor

I think we can potentially also allow running all suites for a particular format by taking table format as an argument to this script. We can probably do that in a separate PR as an improvement.

Contributor Author

Let's do that in another PR.
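
The follow-up idea from this thread, running every suite for one table format, could be sketched as below. It is not part of this PR. The first `TEST_SUITES` entry is quoted from the diff; the second entry and the `select_suites` function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the second suite entry and the function name are hypothetical.
declare -a TEST_SUITES=(
  "spark_sql.sh:delta:spark_sql"
  "spark_hudi.sh:hudi:spark_hudi"
)

# Print the suites whose table_format field equals the requested format.
# With no argument, print every suite.
select_suites() {
  local wanted="${1:-}" suite format
  for suite in "${TEST_SUITES[@]}"; do
    format="$(cut -d: -f2 <<< "$suite")"
    if [[ -z "$wanted" || "$format" == "$wanted" ]]; then
      echo "$suite"
    fi
  done
}
```

A runner script could then pass its first argument through, for example `select_suites "$1"`, and loop over the printed lines.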

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Feb 17, 2026
@flyrain flyrain merged commit 893722c into apache:main Feb 18, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Feb 18, 2026

4 participants